A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs

Author

  • Daisuke Takahashi
Abstract

In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm for clusters of vector symmetric multiprocessor (SMP) nodes. The three-dimensional FFT can be recast as a multirow FFT, which lengthens the innermost loop. We use the multirow FFT algorithm to implement the parallel three-dimensional FFT algorithm. We report performance results for three-dimensional power-of-two FFTs on a cluster of (pseudo) vector SMP nodes, the Hitachi SR8000. We obtained performance of about 40 GFLOPS on a 16-node Hitachi SR8000.
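The idea of recasting a three-dimensional FFT as a multirow FFT can be illustrated with a small serial sketch. It is not the paper's implementation: the function names `multirow_fft` and `fft3d`, the nested-list data layout, and the plain radix-2 decimation-in-time kernel are all assumptions chosen for readability. The point it shows is that when many rows are transformed together, the innermost loop runs over the rows, so its length is the number of rows (for example `n1 * n2`) rather than the length of a single butterfly group.

```python
import cmath


def multirow_fft(rows):
    """Radix-2 FFT of every row at once (hypothetical helper, not from the paper).

    rows: list of m sequences, each of length n, where n is a power of two.
    Returns a new list of m transformed rows.
    """
    m = len(rows)
    n = len(rows[0])
    # Store the data as data[k][r] (element k of row r) so that the
    # innermost loop below walks over the m rows.
    data = [[rows[r][k] for r in range(m)] for k in range(n)]

    # Bit-reversal permutation of the element index k (whole columns move).
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            data[i], data[j] = data[j], data[i]

    # Butterfly stages: one twiddle factor is shared by all m rows.
    length = 2
    while length <= n:
        half = length // 2
        for start in range(0, n, length):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / length)
                a = data[start + k]
                b = data[start + k + half]
                for r in range(m):  # innermost loop has length m
                    t = w * b[r]
                    b[r] = a[r] - t
                    a[r] = a[r] + t
        length *= 2

    return [[data[k][r] for k in range(n)] for r in range(m)]


def fft3d(x):
    """3-D FFT of x[i][j][k] as three multirow FFTs, one per axis."""
    n1, n2, n3 = len(x), len(x[0]), len(x[0][0])

    # Axis 3: n1*n2 rows of length n3.
    rows = multirow_fft([list(x[i][j]) for i in range(n1) for j in range(n2)])
    y = [[rows[i * n2 + j] for j in range(n2)] for i in range(n1)]

    # Axis 2: n1*n3 rows of length n2.
    rows = multirow_fft([[y[i][j][k] for j in range(n2)]
                         for i in range(n1) for k in range(n3)])
    y = [[[rows[i * n3 + k][j] for k in range(n3)]
          for j in range(n2)] for i in range(n1)]

    # Axis 1: n2*n3 rows of length n1.
    rows = multirow_fft([[y[i][j][k] for i in range(n1)]
                         for j in range(n2) for k in range(n3)])
    return [[[rows[j * n3 + k][i] for k in range(n3)]
             for j in range(n2)] for i in range(n1)]
```

In a distributed setting, the re-gathering of rows between axes would correspond to data movement between nodes. The sketch leaves that out entirely and only shows the loop structure.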

Related resources

Modeling Cone-Beam Tomographic Reconstruction Using LogSMP: An Extended LogP Model for Clusters of SMPs

The tomographic reconstruction for cone-beam geometries is a computationally intensive task requiring large memory and computational power to investigate interesting objects. The analysis of its parallel implementation on widely available clusters of SMPs requires an extension of the original LogP model to account for the various communication channels, called LogSMP. The LogSMP model is used i...

Modeling and Simulative Performance Analysis of SMP and Clustered Computer Architectures

The performance characteristics of several classes of parallel computing systems are analyzed and compared using high-fidelity modeling and execution-driven simulation. Processor, bus, and network models are used to construct and simulate the architectures of symmetric multiprocessors (SMPs), clusters of uniprocessors, and clusters of SMPs. To demonstrate a typical use of the models, the perfor...

Technische Universität Chemnitz Sonderforschungsbereich 393 Numerische Simulation auf massiv parallelen Rechnern

The characteristics of irregular algorithms make a parallel implementation difficult, especially for PC clusters or clusters of SMPs. These characteristics may include an unpredictable access behavior to dynamically changing data structures or strong irregular coupling of computations. Problems are an unknown load distribution and expensive irregular communication patterns for data accesses and...

Combining building blocks for parallel multi-level matrix multiplication

EXTENDED ABSTRACT Matrix-matrix multiplication is one of the core computations in many algorithms from scientific computing or numerical analysis and many efficient realizations have been invented over the years, including many parallel ones. The current trend to use clusters of PCs or SMPs for scientific computing suggests to revisit matrix-matrix multiplication and investigate efficiency and ...

Performance Analysis of Algorithms on Shared Memory, Message passing and Hybrid Models for Standalone and Clustered SMPs

While algorithms are well understood in their sequential form, comparatively little is known about how to implement parallel algorithms on mainstream parallel programming platforms and run them on SMP-based mainstream systems such as multi-core clusters. The project aims at better understanding algorithmic techniques like divide and conquer, decrease and conquer, transform and conquer par...

Publication date: 2000